107 research outputs found
Analysis of the Relationships among Longest Common Subsequences, Shortest Common Supersequences and Patterns and its application on Pattern Discovery in Biological Sequences
For a set of multiple sequences, their patterns, Longest Common Subsequences
(LCS) and Shortest Common Supersequences (SCS) represent different aspects of
the sequences' profile, and they can all be used for biological sequence
comparison and analysis. Revealing the relationship between patterns and
LCS/SCS might provide a deeper view of the patterns of biological
sequences, in turn leading to a better understanding of them. However, there
has been no careful examination of the relationship between patterns, LCS and
SCS. In this paper, we analyze their relationship and give some lemmas. Based on
these relationships, a set of algorithms called the PALS (PAtterns by Lcs and Scs)
algorithms is proposed to discover patterns in a set of biological sequences.
These algorithms first generate LCS and SCS results for the sequences
heuristically, and then derive patterns from these results. Experiments
show that the PALS algorithms perform well (in both efficiency and accuracy)
on a variety of sequences. The PALS approach also provides a solution
for transforming between the heuristic results of SCS and LCS.
Comment: Extended version of paper presented in IEEE BIBE 2006 submitted to
journal for review
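As a small illustration of the two notions this abstract compares, the sketch below computes an LCS and an SCS for just two strings using standard dynamic programming. It is not the PALS method, which targets multiple sequences and relies on heuristics; the function names `lcs`, `scs` and `is_subsequence` are hypothetical and chosen only for this example.

```python
def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(c in it for c in s)


def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = length of an LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def scs(a: str, b: str) -> str:
    """Return one shortest common supersequence of a and b.

    For two strings, an SCS can be built by merging a and b along an LCS,
    so len(scs) == len(a) + len(b) - len(lcs).
    """
    out = []
    i = j = 0
    for c in lcs(a, b):
        # Emit the characters of a and b that precede the shared character.
        while a[i] != c:
            out.append(a[i])
            i += 1
        while b[j] != c:
            out.append(b[j])
            j += 1
        out.append(c)
        i += 1
        j += 1
    out.append(a[i:])
    out.append(b[j:])
    return "".join(out)


if __name__ == "__main__":
    x, y = "AGGTAB", "GXTXAYB"
    print(lcs(x, y), scs(x, y))
```

For the pair `"AGGTAB"` and `"GXTXAYB"`, the LCS has length 4 and the SCS has length 6 + 7 - 4 = 9. This length identity holds only for two sequences; for larger sets both problems are hard in general, which is why the abstract speaks of heuristic results.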
Examination of the relationship between essential genes in PPI network and hub proteins in reverse nearest neighbor topology
BACKGROUND: In many protein-protein interaction (PPI) networks, densely connected hub proteins are more likely to be essential proteins. This is referred to as the "centrality-lethality rule", which indicates that the topological placement of a protein in PPI network is connected with its biological essentiality. Though such connections are observed in many PPI networks, the underlying topological properties for these connections are not yet clearly understood. Some suggested putative connections are the involvement of essential proteins in the maintenance of overall network connections, or that they play a role in essential protein clusters. In this work, we have attempted to examine the placement of essential proteins and the network topology from a different perspective by determining the correlation of protein essentiality and reverse nearest neighbor topology (RNN). RESULTS: The RNN topology is a weighted directed graph derived from PPI network, and it is a natural representation of the topological dependences between proteins within the PPI network. Similar to the original PPI network, we have observed that essential proteins tend to be hub proteins in RNN topology. Additionally, essential genes are enriched in clusters containing many hub proteins in RNN topology (RNN protein clusters). Based on these two properties of essential genes in RNN topology, we have proposed a new measure; the RNN cluster centrality. Results from a variety of PPI networks demonstrate that RNN cluster centrality outperforms other centrality measures with regard to the proportion of selected proteins that are essential proteins. We also investigated the biological importance of RNN clusters. CONCLUSIONS: This study reveals that RNN cluster centrality provides the best correlation of protein essentiality and placement of proteins in PPI network. Additionally, merged RNN clusters were found to be topologically important in that essential proteins are significantly enriched in RNN clusters, and biologically important because they play an important role in many Gene Ontology (GO) processes.
http://deepblue.lib.umich.edu/bitstream/2027.42/78257/1/1471-2105-11-505.xml
http://deepblue.lib.umich.edu/bitstream/2027.42/78257/2/1471-2105-11-505-S1.DOC
http://deepblue.lib.umich.edu/bitstream/2027.42/78257/3/1471-2105-11-505.pdf
Peer Reviewed
A post-processing method for optimizing synthesis strategy for oligonucleotide microarrays
The broad applicability of gene expression profiling to genomic analyses has generated huge demand for mass production of microarrays and hence for improving the cost effectiveness of microarray fabrication. We developed a post-processing method for deriving a good synthesis strategy. In this paper, we assessed all the known efficient methods and our post-processing method for reducing the number of synthesis cycles for manufacturing a DNA-chip of a given set of oligos. Our experimental results on both simulated and 52 real datasets show that no single method consistently gives the best synthesis strategy, and that post-processing an existing strategy is necessary as it often reduces the number of synthesis cycles further.
Container solutions for HPC Systems: A Case Study of Using Shifter on Blue Waters
Software container solutions have revolutionized application development
approaches by enabling lightweight platform abstractions within the so-called
"containers." Several solutions are being actively developed in attempts to
bring the benefits of containers to high-performance computing systems with
their stringent security demands on the one hand and fundamental resource
sharing requirements on the other.
  In this paper, we discuss the benefits and shortcomings of such solutions
when deployed on real HPC systems and applied to production scientific
applications. We highlight use cases that are either enabled by or significantly
benefit from such solutions. We discuss the efforts by HPC system
administrators and support staff to support users of these types of workloads on
HPC systems not initially designed with these workloads in mind, focusing on
NCSA's Blue Waters system.
Comment: 8 pages, 7 figures, in PEARC '18: Proceedings of Practice and
Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA,
USA
BOSS-LDG: A Novel Computational Framework that Brings Together Blue Waters, Open Science Grid, Shifter and the LIGO Data Grid to Accelerate Gravitational Wave Discovery
We present a novel computational framework that connects Blue Waters, the
NSF-supported, leadership-class supercomputer operated by NCSA, to the Laser
Interferometer Gravitational-Wave Observatory (LIGO) Data Grid via Open Science
Grid technology. To enable this computational infrastructure, we configured,
for the first time, a LIGO Data Grid Tier-1 Center that can submit
heterogeneous LIGO workflows using Open Science Grid facilities. In order to
enable a seamless connection between the LIGO Data Grid and Blue Waters via
Open Science Grid, we utilize Shifter to containerize LIGO's workflow software.
This work represents the first time Open Science Grid, Shifter, and Blue Waters
are unified to tackle a scientific problem and, in particular, it is the first
time a framework of this nature is used in the context of large scale
gravitational wave data analysis. This new framework has been used in the last
several weeks of LIGO's second discovery campaign to run the most
computationally demanding gravitational wave search workflows on Blue Waters,
and accelerate discovery in the emergent field of gravitational wave
astrophysics. We discuss the implications of this novel framework for a wider
ecosystem of Higher Performance Computing users.
Comment: 10 pages, 10 figures. Accepted as a Full Research Paper to the 13th
IEEE International Conference on eScience
Jerantinine A induces tumor-specific cell death through modulation of splicing factor 3b subunit 1 (SF3B1)
Precursor mRNA (pre-mRNA) splicing is catalyzed by a large ribonucleoprotein complex known as the spliceosome. Numerous studies have indicated that aberrant splicing patterns or mutations in spliceosome components, including the splicing factor 3b subunit 1 (SF3B1), are associated with hallmark cancer phenotypes. This has led to the identification and development of small molecules with spliceosome-modulating activity as potential anticancer agents. Jerantinine A (JA) is a novel indole alkaloid which displays potent anti-proliferative activities against human cancer cell lines by inhibiting tubulin polymerization and inducing G2/M cell cycle arrest. Using a combined pooled genome-wide shRNA library screen and global proteomic profiling, we showed that JA targets the spliceosome by up-regulating SF3B1 and SF3B3 protein in breast cancer cells. Notably, JA induced significant tumor-specific cell death and a significant increase in unspliced pre-mRNAs. In contrast, depletion of endogenous SF3B1 abrogated the apoptotic effects, but not the G2/M cell cycle arrest induced by JA. Further analyses showed that JA stabilizes endogenous SF3B1 protein in breast cancer cells and induces dissociation of the protein from the nucleosome complex. Together, these results demonstrate that JA exerts its antitumor activity by targeting SF3B1 and SF3B3 in addition to its reported targeting of tubulin polymerization.
Towards a better solution to the shortest common supersequence problem: the deposition and reduction algorithm
BACKGROUND: The problem of finding a Shortest Common Supersequence (SCS) of a set of sequences is an important problem with applications in many areas. It is a key problem in biological sequence analysis. The SCS problem is well known to be NP-complete. Many heuristic algorithms have been proposed. Some heuristics work well on a few long sequences (as in sequence comparison applications); others work well on many short sequences (as in oligo-array synthesis). Unfortunately, most do not work well on large SCS instances where there are many long sequences. RESULTS: In this paper, we present a Deposition and Reduction (DR) algorithm for solving large SCS instances of biological sequences. There are two processes in our DR algorithm: a deposition process and a reduction process. The deposition process is responsible for generating a small set of common supersequences, and the reduction process shortens these common supersequences by removing some characters while preserving the common supersequence property. Our evaluation on simulated data and real DNA and protein sequences shows that our algorithm consistently produces the best results compared to many well-known heuristic algorithms, especially on large instances. CONCLUSION: Our DR algorithm provides a partial answer to the open problem of designing efficient heuristic algorithms for the SCS problem on many long sequences. Our algorithm has a bounded approximation ratio. The algorithm is efficient in both running time and space complexity, and our evaluation shows that it is practical even for SCS problems on many long sequences.
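The reduction idea described in this abstract (removing characters while preserving the common supersequence property) can be sketched as a simple greedy pass. This is an illustrative assumption rather than the authors' DR algorithm: the deposition step is replaced here by plain concatenation of the inputs, and `reduce_supersequence` and its helpers are hypothetical names.

```python
from typing import Sequence


def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(c in it for c in s)


def is_common_supersequence(cand: str, seqs: Sequence[str]) -> bool:
    """Return True if every sequence in seqs is a subsequence of cand."""
    return all(is_subsequence(s, cand) for s in seqs)


def reduce_supersequence(cand: str, seqs: Sequence[str]) -> str:
    """Greedily delete characters from cand while it stays a common supersequence.

    Assumption for this sketch: a single left-to-right pass that tries to
    drop one character at a time. The result cannot be shortened by any
    further single deletion, but it is not guaranteed to be a shortest
    common supersequence.
    """
    if not is_common_supersequence(cand, seqs):
        raise ValueError("cand must be a common supersequence of seqs")
    i = 0
    while i < len(cand):
        shorter = cand[:i] + cand[i + 1:]
        if is_common_supersequence(shorter, seqs):
            cand = shorter  # character i was redundant; retry the same index
        else:
            i += 1  # character i is needed; move on
    return cand


if __name__ == "__main__":
    sequences = ["ACGT", "AGCT", "CGTA"]
    # Stand-in for a deposition step: concatenation is always a valid,
    # though usually far too long, common supersequence.
    start = "".join(sequences)
    reduced = reduce_supersequence(start, sequences)
    print(len(start), len(reduced), reduced)
```

Each candidate deletion is checked against every input sequence, so this sketch costs roughly quadratic time in the candidate length times the total input length. It is meant to show the invariant being preserved, not to match the efficiency claimed in the abstract.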